action pattern
Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning
Liu, Shijie, Cullen, Andrew C., Montague, Paul, Erfani, Sarah, Rubinstein, Benjamin I. P.
The current state-of-the-art backdoor attacks against Reinforcement Learning (RL) rely upon unrealistically permissive access models, that assume the attacker can read (or even write) the victim's policy parameters, observations, or rewards. In this work, we question whether such a strong assumption is required to launch backdoor attacks against RL. To answer this question, we propose the \underline{S}upply-\underline{C}h\underline{a}in \underline{B}ackdoor (SCAB) attack, which targets a common RL workflow: training agents using external agents that are provided separately or embedded within the environment. In contrast to prior works, our attack only relies on legitimate interactions of the RL agent with the supplied agents. Despite this limited access model, by poisoning a mere $3\%$ of training experiences, our attack can successfully activate over $90\%$ of triggered actions, reducing the average episodic return by $80\%$ for the victim. Our novel attack demonstrates that RL attacks are likely to become a reality under untrusted RL training supply-chains.
Growable and Interpretable Neural Control with Online Continual Learning for Autonomous Lifelong Locomotion Learning Machines
Srisuchinnawong, Arthicha, Manoonpong, Poramate
Continual locomotion learning faces four challenges: incomprehensibility, sample inefficiency, lack of knowledge exploitation, and catastrophic forgetting. Thus, this work introduces Growable Online Locomotion Learning Under Multicondition (GOLLUM), which exploits the interpretability feature to address the aforementioned challenges. GOLLUM has two dimensions of interpretability: layer-wise interpretability for neural control function encoding and column-wise interpretability for robot skill encoding. With this interpretable control structure, GOLLUM utilizes neurogenesis to unsupervisely increment columns (ring-like networks); each column is trained separately to encode and maintain a specific primary robot skill. GOLLUM also transfers the parameters to new skills and supplements the learned combination of acquired skills through another neural mapping layer added (layer-wise) with online supplementary learning. On a physical hexapod robot, GOLLUM successfully acquired multiple locomotion skills (e.g., walking, slope climbing, and bouncing) autonomously and continuously within an hour using a simple reward function. Furthermore, it demonstrated the capability of combining previous learned skills to facilitate the learning process of new skills while preventing catastrophic forgetting. Compared to state-of-the-art locomotion learning approaches, GOLLUM is the only approach that addresses the four challenges above mentioned without human intervention. It also emphasizes the potential exploitation of interpretability to achieve autonomous lifelong learning machines.
MARL-OT: Multi-Agent Reinforcement Learning Guided Online Fuzzing to Detect Safety Violation in Autonomous Driving Systems
Autonomous Driving Systems (ADSs) are safety-critical, as real-world safety violations can result in significant losses. Rigorous testing is essential before deployment, with simulation testing playing a key role. However, ADSs are typically complex, consisting of multiple modules such as perception and planning, or well-trained end-to-end autonomous driving systems. Offline methods, such as the Genetic Algorithm (GA), can only generate predefined trajectories for dynamics, which struggle to cause safety violations for ADSs rapidly and efficiently in different scenarios due to their evolutionary nature. Online methods, such as single-agent reinforcement learning (RL), can quickly adjust the dynamics' trajectory online to adapt to different scenarios, but they struggle to capture complex corner cases of ADS arising from the intricate interplay among multiple vehicles. Multi-agent reinforcement learning (MARL) has a strong ability in cooperative tasks. On the other hand, it faces its own challenges, particularly with convergence. This paper introduces MARL-OT, a scalable framework that leverages MARL to detect safety violations of ADS resulting from surrounding vehicles' cooperation. MARL-OT employs MARL for high-level guidance, triggering various dangerous scenarios for the rule-based online fuzzer to explore potential safety violations of ADS, thereby generating dynamic, realistic safety violation scenarios. Our approach improves the detected safety violation rate by up to 136.2% compared to the state-of-the-art (SOTA) testing technique.
Exploring Large Language Models as a Source of Common-Sense Knowledge for Robots
Ocker, Felix, Deigmöller, Jörg, Eggert, Julian
Service robots need common-sense knowledge to help humans in everyday situations as it enables them to understand the context of their actions. However, approaches that use ontologies face a challenge because common-sense knowledge is often implicit, i.e., it is obvious to humans but not explicitly stated. This paper investigates if Large Language Models (LLMs) can fill this gap. Our experiments reveal limited effectiveness in the selective extraction of contextual action knowledge, suggesting that LLMs may not be sufficient on their own. However, the large-scale extraction of general, actionable knowledge shows potential, indicating that LLMs can be a suitable tool for efficiently creating ontologies for robots. This paper shows that the technique used for knowledge extraction can be applied to populate a minimalist ontology, showcasing the potential of LLMs in synergy with formal knowledge representation.
Building a Timeline Network for Evacuation in Earthquake Disaster
Nguyen, The Minh (The University of Electro-Communications) | Kawamura, Takahiro (The University of Electro-Communications) | Tahara, Yasuyuki (The University of Electro-Communications) | Ohsuga, Akihiko (The University of Electro-Communications)
In this paper, we propose an approach that automatically extract users’ activities in sentences retrieved from Twitter. We then design a timeline action networkbased on Web Ontology Language (OWL). By using the proposed activity extraction approach, we can automatically collect data for the action network. Finally, we propose a novel action-based collaborative filtering, which predicts missing activity data, in order to complement this timeline network. Moreover, with a combination of collaborative filtering and natural language processing (NLP), our method can deal with minority actions such as successful actions. Based on evaluation of tweets which related to the massive Tohoku earthquake,we indicated that our timeline action network can provide useful action patterns in real-time. Not only earthquake disaster, our research can also be applied to other disasters and business models, such as typhoon,travel, marketing, etc.